Clustering the Normalized Compression Distance for Influenza Virus Data
Internal identifier: 000F68 (Main/Exploration); previous: 000F67; next: 000F69
Authors: Kimihito Ito [Japan]; Thomas Zeugmann [Japan]; Yu Zhu [Japan]
Source:
- Lecture Notes in Computer Science [0302-9743]
Abstract
This paper analyzes how useful the normalized compression distance is for clustering hemagglutinin (HA) gene sequences of influenza viruses, depending on the compressor used. Using the CompLearn Toolkit, the built-in compressors zlib and bzip2 are compared. Hierarchical and spectral clustering are compared as well. For hierarchical clustering, hclust from the R package is used; spectral clustering is done via the kLines algorithm proposed by Fischer and Poland (2004). The results are very promising and show that an (almost) perfect clustering can be obtained. The zlib compressor gave better results than the bzip2 compressor and, when all data are considered, hierarchical clustering is slightly better than spectral clustering via kLines.
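To make the central quantity concrete, here is a minimal sketch of the normalized compression distance in its standard form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length in bytes. It relies on Python's standard zlib and bz2 modules rather than the CompLearn Toolkit used in the paper, so it is an illustration under that assumption and not the authors' pipeline. The function names (`compressed_size`, `ncd`, `distance_matrix`) and the toy sequences are hypothetical.

```python
import bz2
import zlib


def compressed_size(data: bytes, compressor: str = "zlib") -> int:
    """Return the length in bytes of `data` after compression."""
    if compressor == "zlib":
        return len(zlib.compress(data, 9))
    if compressor == "bzip2":
        return len(bz2.compress(data, 9))
    raise ValueError(f"unknown compressor: {compressor!r}")


def ncd(x: bytes, y: bytes, compressor: str = "zlib") -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x, compressor)
    cy = compressed_size(y, compressor)
    cxy = compressed_size(x + y, compressor)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs: list[bytes], compressor: str = "zlib") -> list[list[float]]:
    """Pairwise NCD matrix; the upper triangle is computed and mirrored,
    so the result is symmetric with a zero diagonal by construction."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = ncd(seqs[i], seqs[j], compressor)
    return matrix


if __name__ == "__main__":
    # Hypothetical toy sequences, not real HA data.
    toy = [b"ATGAAGACTATCATTGCT" * 20, b"ATGAAGACTATCATTGCA" * 20, b"GGCCTTAACCGGTTAAGC" * 20]
    for row in distance_matrix(toy, "zlib"):
        print(["%.3f" % value for value in row])
```

A matrix of this kind is what a hierarchical clustering routine such as R's hclust, mentioned in the abstract, would take as input once converted to that tool's distance format.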
DOI: 10.1007/978-3-642-12476-1_9
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000E18
- to stream Istex, to step Curation: 000E18
- to stream Istex, to step Checkpoint: 000235
- to stream Main, to step Merge: 000F76
- to stream Main, to step Curation: 000F68
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Clustering the Normalized Compression Distance for Influenza Virus Data</title>
<author><name sortKey="Ito, Kimihito" sort="Ito, Kimihito" uniqKey="Ito K" first="Kimihito" last="Ito">Kimihito Ito</name>
</author>
<author><name sortKey="Zeugmann, Thomas" sort="Zeugmann, Thomas" uniqKey="Zeugmann T" first="Thomas" last="Zeugmann">Thomas Zeugmann</name>
</author>
<author><name sortKey="Zhu, Yu" sort="Zhu, Yu" uniqKey="Zhu Y" first="Yu" last="Zhu">Yu Zhu</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:F78AFAB7DE19FE4803D32A2D6BFCCB44795CFF08</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1007/978-3-642-12476-1_9</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HCB-2MQPN5CV-Q/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000E18</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000E18</idno>
<idno type="wicri:Area/Istex/Curation">000E18</idno>
<idno type="wicri:Area/Istex/Checkpoint">000235</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000235</idno>
<idno type="wicri:doubleKey">0302-9743:2010:Ito K:clustering:the:normalized</idno>
<idno type="wicri:Area/Main/Merge">000F76</idno>
<idno type="wicri:Area/Main/Curation">000F68</idno>
<idno type="wicri:Area/Main/Exploration">000F68</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Clustering the Normalized Compression Distance for Influenza Virus Data</title>
<author><name sortKey="Ito, Kimihito" sort="Ito, Kimihito" uniqKey="Ito K" first="Kimihito" last="Ito">Kimihito Ito</name>
<affiliation wicri:level="1"><country xml:lang="fr">Japon</country>
<wicri:regionArea>Research Center for Zoonosis Control, Hokkaido University, N-20, W-10 Kita-ku, 001-0020, Sapporo</wicri:regionArea>
<wicri:noRegion>Sapporo</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Japon</country>
</affiliation>
</author>
<author><name sortKey="Zeugmann, Thomas" sort="Zeugmann, Thomas" uniqKey="Zeugmann T" first="Thomas" last="Zeugmann">Thomas Zeugmann</name>
<affiliation wicri:level="1"><country xml:lang="fr">Japon</country>
<wicri:regionArea>Division of Computer Science, Hokkaido University, N-14, W-9, Sapporo, 060-0814</wicri:regionArea>
<wicri:noRegion>060-0814</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Japon</country>
</affiliation>
</author>
<author><name sortKey="Zhu, Yu" sort="Zhu, Yu" uniqKey="Zhu Y" first="Yu" last="Zhu">Yu Zhu</name>
<affiliation wicri:level="1"><country xml:lang="fr">Japon</country>
<wicri:regionArea>Division of Computer Science, Hokkaido University, N-14, W-9, Sapporo, 060-0814</wicri:regionArea>
<wicri:noRegion>060-0814</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Japon</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s" type="main" xml:lang="en">Lecture Notes in Computer Science</title>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: The present paper analyzes the usefulness of the normalized compression distance for the problem to cluster the hemagglutinin (HA) sequences of influenza virus data for the HA gene in dependence on the available compressors. Using the CompLearn Toolkit, the built-in compressors zlib and bzip2 are compared. Moreover, a comparison is made with respect to hierarchical and spectral clustering. For the hierarchical clustering, hclust from the R package is used, and the spectral clustering is done via the kLine algorithm proposed by Fischer and Poland (2004). Our results are very promising and show that one can obtain an (almost) perfect clustering. It turned out that the zlib compressor allowed for better results than the bzip2 compressor and, if all data are concerned, then hierarchical clustering is a bit better than spectral clustering via kLines.</div>
</front>
</TEI>
<affiliations><list><country><li>Japon</li>
</country>
</list>
<tree><country name="Japon"><noRegion><name sortKey="Ito, Kimihito" sort="Ito, Kimihito" uniqKey="Ito K" first="Kimihito" last="Ito">Kimihito Ito</name>
</noRegion>
<name sortKey="Ito, Kimihito" sort="Ito, Kimihito" uniqKey="Ito K" first="Kimihito" last="Ito">Kimihito Ito</name>
<name sortKey="Zeugmann, Thomas" sort="Zeugmann, Thomas" uniqKey="Zeugmann T" first="Thomas" last="Zeugmann">Thomas Zeugmann</name>
<name sortKey="Zeugmann, Thomas" sort="Zeugmann, Thomas" uniqKey="Zeugmann T" first="Thomas" last="Zeugmann">Thomas Zeugmann</name>
<name sortKey="Zhu, Yu" sort="Zhu, Yu" uniqKey="Zhu Y" first="Yu" last="Zhu">Yu Zhu</name>
<name sortKey="Zhu, Yu" sort="Zhu, Yu" uniqKey="Zhu Y" first="Yu" last="Zhu">Yu Zhu</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/H2N2V1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F68 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000F68 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Sante |area= H2N2V1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:F78AFAB7DE19FE4803D32A2D6BFCCB44795CFF08 |texte= Clustering the Normalized Compression Distance for Influenza Virus Data }}
This area was generated with Dilib version V0.6.33.